Dyr og Data

Data visualisation — scales and guides

Gavin Simpson

Aarhus University

Mona Larsen

Aarhus University

2024-09-12

Introduction

In this section, we’ll look at

  • scales
  • guides

Data

In this video we'll use the penguins data set from the palmerpenguins 📦

We'll also use the GB bovine TB data set

library("palmerpenguins")
library("ggplot2")
library("dplyr")
library("readxl")

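# read the GB bovine TB statistics and derive year and day-of-year columns
bovine <- read_xlsx("data/bovine-tb/gb-tb-stats.xlsx") |>
  mutate(date = as.Date(date),
    year = format(date, "%Y"),               # year as a character string
    doy = as.numeric(format(date, "%j"))) |> # day of the year
  rename(n_cases = n_not_otf)                # shorter name for the count column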

# reusable axis labels for the penguin plots
penguin_labs <- labs(x = "Bill length (mm)", y = "Flipper length (mm)")

Scales, guides, and themes

  • Scale: every aesthetic has a scale; to adjust it, add a scale_*() function
  • Guide: many scales are linked to a legend or an axis (a guide); to adjust it, use the guides() function

Scales

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases")

Scales

For continuous data the default scale is scale_AES_continuous(), so adding scale_y_continuous() explicitly gives the same plot

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases") +
    scale_y_continuous()
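One way to check that the explicit scale is the same as the default is to inspect the trained y scale with layer_scales(). This is a minimal sketch; tb_toy is a small hypothetical data frame standing in for the real bovine data, and the values are made up.

```r
library("ggplot2")

# hypothetical toy data standing in for the bovine TB data (values made up)
tb_toy <- data.frame(
  country = rep(c("England", "Scotland", "Wales"), each = 4),
  n_cases = c(120, 340, 560, 80, 2, 5, 1, 3, 30, 45, 60, 25)
)

p_default  <- ggplot(tb_toy, aes(x = country, y = n_cases)) + geom_boxplot()
p_explicit <- p_default + scale_y_continuous()

# layer_scales() builds the plot and returns the trained x and y scales;
# both plots should use a continuous position scale for y
class(layer_scales(p_default)$y)[1]
class(layer_scales(p_explicit)$y)[1]
```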

Scales

To modify the scale, we choose a different one, here a log10 scale for the y axis

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases") +
    scale_y_log10(breaks = scales::breaks_log(), labels = scales::label_log())

Scales

We can supply our own labels and breaks, or generate them with functions from the scales 📦

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases") +
    scale_y_log10(breaks = scales::breaks_log(), labels = scales::label_log()) +
    scale_x_discrete(labels = LETTERS[1:4]) # replace country names by letters, in level order
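A sketch contrasting the two options, using a hypothetical toy data frame (tb_toy, made-up values). Hand-written breaks and labels must line up one to one, whereas scales helpers such as label_comma() are ordinary functions that take the break values and return labels.

```r
library("ggplot2")

# hypothetical toy data (values made up)
tb_toy <- data.frame(
  country = rep(c("England", "Scotland", "Wales"), each = 3),
  n_cases = c(1500, 23000, 4000, 12, 40, 7, 300, 950, 120)
)

p <- ggplot(tb_toy, aes(x = country, y = n_cases)) + geom_boxplot()

# option 1: supply breaks and labels by hand (same length, same order)
p_manual <- p + scale_y_log10(breaks = c(10, 100, 1000, 10000),
                              labels = c("10", "100", "1k", "10k"))

# option 2: let the scales package generate breaks and labels
p_scales <- p + scale_y_log10(breaks = scales::breaks_log(),
                              labels = scales::label_comma())

# a scales labeller is just a function of the break values
comma_labeller <- scales::label_comma(accuracy = 1)
comma_labeller(c(1000, 25000))
```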

Continuous colour scales

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)")

Continuous colour scales

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c()

Continuous colour scales

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c(option = "plasma")

Scales

Each type of aesthetic (channel) has an associated scale

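scale_AES_KIND(), where AES is the aesthetic (x, y, colour, fill, etc.) and KIND is the type of scale (continuous, discrete, log10, viridis_c, etc.)

scale_x_continuous() for a continuous x-axis scale

scale_y_discrete() for a discrete y-axis scale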
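Because every scale function follows the scale_AES_KIND() pattern, a simple way to see which kinds exist for an aesthetic is to list the matching functions exported by ggplot2. A small sketch:

```r
library("ggplot2")

# scale functions follow scale_<aesthetic>_<kind>(); list some of them
colour_scales <- ls("package:ggplot2", pattern = "^scale_colour_")
y_scales      <- ls("package:ggplot2", pattern = "^scale_y_")

colour_scales
y_scales
```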

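Scales can transform data: the trans argument (renamed transform in ggplot2 3.5.0) supports many common transformations, such as "log10", "sqrt", and "reverse"

Scale transformations are applied before any statistical transformation or calculation, so a stat such as the boxplot summary is computed on the transformed values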
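A minimal sketch of this ordering, using a hypothetical five-value toy data set and stat_summary() to compute a mean. With scale_y_log10() the stat sees log10(y) = 0, 1, 2, 3, 4, so the mean it returns is 2 in log10 units (100 on the original scale), not the arithmetic mean of y.

```r
library("ggplot2")

# hypothetical toy data: five values spanning four orders of magnitude
toy <- data.frame(g = "a", y = c(1, 10, 100, 1000, 10000))

p <- ggplot(toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point") +
  scale_y_log10()

# layer_data() returns the stat output in transformed (log10) units
layer_data(p)$y
```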

Scales and transforms

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases") +
    scale_y_continuous(trans = "log10", labels = scales::comma)

Scales and transforms

To apply a transformation after the statistics have been computed, use coord_trans() instead

bovine |>
  ggplot(aes(x = country, y = n_cases)) + geom_boxplot() +
    labs(x = NULL, y = "Number of bovine TB cases") +
    scale_y_continuous(labels = scales::comma) +
    coord_trans(y = "log10")
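The difference between the two approaches shows up in the numbers the stat computes. In this sketch (hypothetical toy data, mean computed with stat_summary()), the scale version averages log10(y), while the coord_trans() version averages the raw values and only then draws the result on a log10 axis.

```r
library("ggplot2")

# hypothetical toy data
toy <- data.frame(g = "a", y = c(1, 10, 100, 1000, 10000))
base <- ggplot(toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point")

# transform in the scale: the mean is computed on log10(y)
p_scale <- base + scale_y_log10()
# transform in the coordinate system: the mean is computed on raw y
p_coord <- base + coord_trans(y = "log10")

layer_data(p_scale)$y   # 2 in log10 units, i.e. 100 on the data scale
layer_data(p_coord)$y   # 2222.2, the arithmetic mean of y
```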

Scales and breaks

Manually set the breaks through the breaks argument

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c(option = "plasma") +
    scale_x_continuous(breaks = seq(30, 60, by = 5))
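Instead of hard-coding the sequence, the breaks argument also accepts a function of the scale's range. A sketch using scales::breaks_width(), with a hypothetical three-point data frame in place of the penguins data:

```r
library("ggplot2")

# breaks_width(5) returns a function that puts a break every 5 units
width_5 <- scales::breaks_width(5)
width_5(c(32, 59))

# hypothetical toy data standing in for the penguins
toy <- data.frame(bill = c(32.1, 45.0, 59.6), flipper = c(172, 200, 231))

# the breaks now follow the data range instead of a fixed seq()
p <- ggplot(toy, aes(x = bill, y = flipper)) +
  geom_point() +
  scale_x_continuous(breaks = scales::breaks_width(5))
```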

Scales and names

The name argument of a scale sets the scale's name, which is used as the title of its guide

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm, colour = species)) +
    geom_point() +
    scale_color_brewer(
      palette = "Set1",
      name = "Species",
      labels = c("adelie", "chinstrap", "gentoo")) +
    penguin_labs
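Unnamed labels are matched to the breaks by position, so they silently go wrong if the level order changes. As far as we know, ggplot2 discrete scales also accept a named labels vector, matched to the breaks by name. A sketch with a hypothetical toy data frame; get_guide_data() requires ggplot2 3.5.0 or later.

```r
library("ggplot2")

# hypothetical toy data with one point per penguin species
toy <- data.frame(
  bill = c(39.1, 46.5, 47.3),
  flipper = c(181, 195, 216),
  species = factor(c("Adelie", "Chinstrap", "Gentoo"))
)

# names are matched to the breaks, so the entries can be in any order
p <- ggplot(toy, aes(x = bill, y = flipper, colour = species)) +
  geom_point() +
  scale_colour_brewer(
    palette = "Set1",
    name = "Species",
    labels = c(Gentoo = "gentoo", Adelie = "adelie", Chinstrap = "chinstrap"))

# inspect the labels the legend will show
get_guide_data(p, "colour")$.label
```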

Guides — fine control

Guides offer finer control over legends

Add to the plot using the guides() function

guide_legend() is for discrete legends

guide_colourbar() is for continuous colour ranges

You only need guides() if you want more control than the defaults give you
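Besides guides(), a guide can be passed to the scale's own guide argument; both routes configure the same legend. A minimal sketch with a hypothetical toy data frame (get_guide_data() needs ggplot2 3.5.0 or later):

```r
library("ggplot2")

# hypothetical toy data with a continuous colour variable
toy <- data.frame(x = 1:5, y = c(3, 1, 4, 1, 5),
                  mass = c(3200, 3700, 4100, 4800, 5600))
base <- ggplot(toy, aes(x = x, y = y, colour = mass)) + geom_point()

# 1. configure the guide with guides(), keyed by aesthetic name
p1 <- base + scale_colour_viridis_c() +
  guides(colour = guide_colourbar(reverse = TRUE))

# 2. or hand the same guide to the scale directly
p2 <- base + scale_colour_viridis_c(guide = guide_colourbar(reverse = TRUE))
```

guides() is handy when the scale itself is left at its default; the guide argument keeps scale and legend settings in one call.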

Avoid repeating code…

Compute and store the total number of bovine TB cases by country in 2020 in total_tb

total_tb <- bovine |>
  filter(year == "2020") |>
  group_by(country) |>
  summarise(total_cases = sum(n_cases))
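Plot components can be stored for reuse too, just as penguin_labs stores labels: ggplot2 adds each element of a list in turn. A sketch, where toy_total and tb_common are hypothetical names and the values are made up:

```r
library("ggplot2")

# hypothetical stand-in for total_tb (values made up)
toy_total <- data.frame(country = c("England", "Scotland", "Wales"),
                        total_cases = c(25000, 40, 4500))

# shared components stored once in a list
tb_common <- list(
  geom_col(),
  scale_y_continuous(labels = scales::comma),
  labs(y = "Total cases", x = NULL)
)

p_plain <- ggplot(toy_total, aes(x = country, y = total_cases)) + tb_common
p_fill  <- ggplot(toy_total, aes(x = country, y = total_cases, fill = country)) +
  tb_common + guides(fill = "none")
```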

Guides — fine control

Turn off the guide if it is redundant; here the x axis already identifies each country

total_tb |>
  ggplot(aes(y = total_cases, x = country, fill = country)) +
    geom_col() +
    scale_y_continuous(labels = scales::comma) +
    labs(y = "Total cases", x = NULL, title = "Cases of bovine TB in 2020") +
    guides(fill = "none")
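There are several equivalent ways to drop a redundant legend. A sketch with a hypothetical toy_total data frame (made-up values):

```r
library("ggplot2")

# hypothetical stand-in for total_tb (values made up)
toy_total <- data.frame(country = c("England", "Scotland", "Wales"),
                        total_cases = c(25000, 40, 4500))
base <- ggplot(toy_total, aes(x = country, y = total_cases, fill = country))

# three ways to suppress the fill legend
p_guides <- base + geom_col() + guides(fill = "none")
p_scale  <- base + geom_col() + scale_fill_discrete(guide = "none")
p_layer  <- base + geom_col(show.legend = FALSE)
```

guides() and the scale's guide argument remove the legend for the aesthetic across all layers, whereas show.legend = FALSE only stops that one layer contributing to it.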

Guides — fine control

We can control the layout of the legend, for example the number of columns and whether keys fill them row by row

total_tb |>
  ggplot(aes(y = total_cases, x = country, fill = country)) +
    geom_col() +
    scale_y_continuous(labels = scales::comma) +
    labs(y = "Total cases", x = NULL, title = "Cases of bovine TB in 2020") +
    guides(fill = guide_legend(ncol = 2, byrow = TRUE, title = "Country"))
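Layout options combine well with the legend position, which is set through the theme. A sketch placing a one-row legend below the plot, using a hypothetical toy_total data frame:

```r
library("ggplot2")

# hypothetical stand-in for total_tb (values made up)
toy_total <- data.frame(country = c("England", "Scotland", "Wales"),
                        total_cases = c(25000, 40, 4500))

# a single-row legend fits naturally underneath the panel
p <- ggplot(toy_total, aes(x = country, y = total_cases, fill = country)) +
  geom_col() +
  guides(fill = guide_legend(nrow = 1, title = "Country")) +
  theme(legend.position = "bottom")
```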

Guides — fine control

Reverse the order of the legend keys

total_tb |>
  ggplot(aes(y = total_cases, x = country, fill = country)) +
    geom_col() +
    scale_y_continuous(labels = scales::comma) +
    labs(y = "Total cases", x = NULL, title = "Cases of bovine TB in 2020") +
    guides(fill = guide_legend(reverse = TRUE))

guide_colourbar()

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c(option = "plasma") +
    guides(colour = guide_colourbar(reverse = FALSE))

guide_colourbar()

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c(option = "plasma") +
    guides(colour = guide_colourbar(reverse = TRUE))

guide_colourbar()

penguins |>
  ggplot(aes(x = bill_length_mm, y = flipper_length_mm)) +
    geom_point(mapping = aes(colour = body_mass_g)) +
    penguin_labs + labs(colour = "Body mass (g)") +
    scale_colour_viridis_c(option = "plasma") +
    guides(colour = guide_colourbar(barheight = unit(4, 'cm')))